A New Parallel Partition Algorithm for Parallel Suffix Tree Construction
Abstract
The suffix tree is a compacted trie of all suffixes of a given string. It is a fundamental data structure in a wide range of domains, including text processing, data compression, computer vision, and computational biology [1]. It can also be applied to network research such as web analysis, which has been studied actively [2], [3]. For example, suffix trees have been used to search efficiently for genomic DNA data in databases or for text on the web. Recently, as parallel architectures such as distributed systems and chip multiprocessors (CMPs) have improved, practical parallel suffix tree construction algorithms have been studied. Chen and Schmidt [4] proposed a parallel algorithm for constructing suffix trees on a computational grid. Tsirogiannis and Koudas [5] proposed cache-conscious suffix tree construction algorithms tailored to CMP architectures. These algorithms divide the suffixes into a number of partitions and construct the suffix tree for each partition in parallel. In this paper, we propose a new algorithm for partitioning suffixes. Our algorithm is basically similar to that in [5], which partitions suffixes by prefixes of variable lengths, but it uses a trie instead of a hash table as the auxiliary data structure. Moreover, our algorithm can be easily parallelized to suit parallel architectures.
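The idea of partitioning suffixes by variable-length prefixes can be illustrated with a minimal Python sketch. It is not the algorithm of the paper or of [5]: the function name `partition_suffixes`, the size bound `max_size`, and the `$` sentinel are assumptions made for this illustration. Each stack entry plays the role of a trie node identified by its prefix, and a node is split by the next character only while it holds more than `max_size` suffixes.

```python
from collections import defaultdict


def partition_suffixes(text, max_size):
    """Group suffix start positions by variable-length prefixes.

    Returns a dict mapping each prefix to the list of suffix start
    positions that share it. Every group holds at most max_size
    suffixes. max_size is a hypothetical tuning parameter, not a
    value taken from the paper.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")

    # Assumption: '$' does not occur in text, so it acts as a unique
    # terminator and no suffix is a prefix of another suffix.
    s = text + "$"

    partitions = {}
    # Each stack entry is an implicit trie node: (prefix, positions).
    stack = [("", list(range(len(s))))]
    while stack:
        prefix, positions = stack.pop()
        # A non-root node that is small enough becomes one partition.
        if prefix and len(positions) <= max_size:
            partitions[prefix] = positions
            continue
        # Otherwise split the node by the character after the prefix.
        depth = len(prefix)
        children = defaultdict(list)
        for p in positions:
            children[s[p + depth]].append(p)
        for ch, group in children.items():
            stack.append((prefix + ch, group))
    return partitions


if __name__ == "__main__":
    for prefix, group in sorted(partition_suffixes("banana", 2).items()):
        print(repr(prefix), group)
```

For the string `banana` with `max_size=2`, the suffixes starting with `a` are too numerous for one group and are split into the prefixes `an` and `a$`, while `b`, `n`, and `$` stay at length 1. Because the groups are disjoint, the subtree for each prefix could in principle be built by a separate worker, which is the property the partitioning approach relies on.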
Similar resources
A Dynamic Approach to Weighted Suffix Tree Construction Algorithm
At present, the weighted suffix tree is considered one of the most important data structures for analyzing molecular weighted sequences. Although a parallel algorithm based on static partitioning exists for constructing weighted suffix trees, it takes a significant amount of time for very long weighted DNA sequences. However, in our implementation of dynamic partition based...
Distributed suffix trees
We present a new variant of the suffix tree called a distributed suffix tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for ...
ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree ...
Parallel Construction of Minimal Suffix and Factor Automata
This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well known relation between suffix and factor aut...
A Simple Parallel Cartesian Tree Algorithm and its Application to Suffix Tree Construction
We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. As a special case, the algorithm can be used to generate suffix trees from suffix arrays on arbitrary alphabets in the same bounds. In conjunction with parallel suffix array algorithms, such as the skew algorithm, this gives a rather simple linear work parallel algorit...
Publication date: 2012